Clustering and Visualization of High-dimensional Biological Datasets Using a Fast HMA Approximation

Authors

  • GUNJAN K. GUPTA
  • ALEXANDER Y. LIU
  • JOYDEEP GHOSH

Abstract

In this paper, we reintroduce Hierarchical Mode Analysis (HMA), first proposed in 1968, as a powerful clustering algorithm for bioinformatics. HMA's ability to find a compact hierarchy of a small number of dense clusters is important in many bioinformatics problems. For example, when clustering genes across a set of gene-expression microarrays, only a small number of genes related to the experimental context cluster well, and the rest need to be pruned. We also present two major improvements to HMA: a faster approximation algorithm and a novel 2-D visualization scheme for high-dimensional datasets. Together, these improvements make HMA a powerful and promising new tool for many large, high-dimensional clustering problems in bioinformatics. We present empirical results on the Gasch dataset that show the effectiveness of our framework.
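The density-based idea behind mode analysis (keep a few dense clusters, arrange them in a hierarchy, prune the rest) can be illustrated with a small sketch. The Python below is a simplified, Wishart-style mode analysis written for illustration only; it is not the authors' fast HMA approximation, and the names `mode_analysis` and `pairwise_dist` and the parameters `k` and `n_keep` are hypothetical. It assumes density is judged by each point's distance to its k-th nearest neighbour, admits points densest-first, links each new point to already-admitted points within that radius, records a hierarchy level whenever a new point joins two or more existing clusters, and prunes every point outside the `n_keep` densest.

```python
import numpy as np


def pairwise_dist(X):
    """Dense Euclidean distance matrix (O(n^2) memory; fine for a small sketch)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mode_analysis(X, k=3, n_keep=None):
    """Simplified Wishart-style mode analysis (illustrative, not the paper's fast HMA).

    Density of a point is judged by r_k, the distance to its k-th nearest
    neighbour (smaller r_k = denser). Points are admitted densest-first; a new
    point is linked to every admitted point within its own r_k. A point linked
    to nothing starts a new cluster (a mode); a point linking several clusters
    merges them, and that merge is recorded as one level of the hierarchy.
    Only the n_keep densest points are admitted; the rest are labelled -1.
    """
    n = len(X)
    n_keep = n if n_keep is None else n_keep
    d = pairwise_dist(X)
    r_k = np.sort(d, axis=1)[:, k]  # column 0 is the point itself (distance 0)
    order = np.argsort(r_k, kind="stable")

    parent = {}

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    admitted, merges = [], []
    for i in order[:n_keep]:
        i = int(i)
        # roots of the existing clusters this point reaches within its radius
        roots = {find(j) for j in admitted if d[i, j] <= r_k[i]}
        parent[i] = i
        if len(roots) > 1:
            # the new point bridges separate clusters: one hierarchy level
            merges.append((float(r_k[i]), sorted(roots)))
        for root in roots:
            parent[root] = i  # the new point becomes the root of the joined cluster
        admitted.append(i)

    labels = np.full(n, -1)  # -1 marks pruned (low-density) points
    ids = {}
    for i in admitted:
        labels[i] = ids.setdefault(find(i), len(ids))
    return labels, merges
```

For example, on two tight 2-D blobs of four points each plus one far-away point, calling `mode_analysis(X, k=2, n_keep=8)` should return two clusters and label the distant point -1. The dense O(n²) distance matrix keeps the sketch short but would not scale to large gene-expression matrices.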

Similar resources

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

Using Graphs for Fast Error Term Approximation of Time-varying Datasets

Session 1: Time-Varying, High Dimensional and Spatial Visualization Chair (Bill Ribarsky) 11h00-11h30 Using Graphs for Fast Error Term Approximation of Time-varying Datasets Christof Nuber, Eric C. LaMar, Valerio Pascucci, Bernd Hamann, Kenneth I. Joy 11h30-12h00 Visual Hierarchical Dimension Reduction for Exploration of High Dimensional Datasets Jing Yang, Matthew O. Ward, Elke A. Rundensteine...

The BioDICE Taverna plugin for clustering and visualization of biological data: a workflow for molecular compounds exploration

Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platfor...


Publication date: 2006